Abstract Given a permutation $\pi$, the application of prefix reversal $f^{(i)}$ to $\pi$ reverses the
order of the first $i$ elements of $\pi$. The problem of Sorting By Prefix Reversals (also known
as pancake flipping), made famous by Gates and Papadimitriou (Bounds for sorting by 
prefix reversal. Discrete Mathematics 27, pp. 47-57), asks for the minimum number of prefix 
reversals required to sort the elements of a given permutation. In this paper we study a 
variant of this problem where the prefix reversals act not on permutations but on strings 
over a fixed size alphabet. We determine the minimum number of prefix reversals required to 
sort binary and ternary strings, with polynomial-time algorithms for these sorting problems 
as a result; demonstrate that computing the minimum prefix reversal distance between two 
binary strings is NP-hard; give an exact expression for the prefix reversal diameter of binary 
strings, and give bounds on the prefix reversal diameter of ternary strings. We also consider 
a weaker form of sorting called grouping (of identical symbols) and give polynomial-time 
algorithms for optimally grouping binary and ternary strings. A number of intriguing open 
problems are also discussed. 

1 Introduction 

For a permutation $\pi = \pi(0)\pi(1)\ldots\pi(n-1)$ the application of prefix reversal $f^{(i)}$, which we call flip
for short, to $\pi$ reverses the order of the first $i$ elements: $f^{(i)}(\pi) = \pi(i-1)\ldots\pi(0)\pi(i)\ldots\pi(n-1)$.
The problem of Sorting By Prefix Reversals (MIN-SBPR), brought to popularity by Gates and
Papadimitriou [5] and often referred to as the pancake flipping problem, is defined as follows: given
a permutation $\pi$ of $\{0, 1, \ldots, n-1\}$, determine its sorting distance, i.e. the smallest number of flips
required to transform $\pi$ into the identity permutation $01\ldots(n-1)$.^1

MIN-SBPR has practical relevance in the area of efficient network design [lUlllj . and arises 
in the context of computational biology when seeking to explain the genetic difference between 
two given species by the most parsimonious (i.e. shortest) sequence of gene rearrangements. The 
computational complexity of MIN-SBPR remains open. A recent 2-approximation algorithm |S] 
is currently the best-known approximation result.^2 Indeed, most studies to date have focused
not on the computational complexity of MIN-SBPR but rather on determining the worst-case
sorting distance $wc(n)$ over all length-$n$ permutations, i.e. the "worst case scenario" for length-$n$
permutations. From IH1 and IIF we know that $(15/14)n \le wc(n) \le (5n + 5)/3$.

A natural variant of MIN-SBPR is to consider the action of flips not on permutations but on 
strings over fixed size alphabets. The shift from permutations to strings alters the problem universe 
somewhat. With permutations, for example, the distance problem, i.e. given two permutations $\pi_1$
and $\pi_2$, determine the smallest number of flips required to transform $\pi_1$ into $\pi_2$, is equivalent to
sorting, because the symbols can simply be relabelled to make either permutation equal to the 
identity permutation. For strings like 101, such a relabelling is not possible. Thus, the distance 
problem on string pairs appears to be strictly more general than the sorting problem on strings, 
naturally defined as putting all elements in non-descending order. 

* This research has been funded by the Dutch BSIK/BRICKS project. 
^1 We adopt the convention of numbering from 0 rather than 1.

^2 Although not explicitly described as such, the algorithm provided ten years earlier in |0| is a
2-approximation algorithm for the signed version of the problem.



Indeed, papers by Christie and Irving [2] and Radcliffe, Scott and Wilmer [12] explore the
consequences of switching from permutations to strings; they both consider arbitrary (substring)
reversals, and transpositions (where two adjacent substrings are swapped). It has been noted
that, viewed as a whole, such rearrangement operations on strings have bearing on the study 
of orthologous gene assignment 0, especially where the level of symbol repetition in the strings 
is low. There is also a somewhat surprising link with the relatively unexplored family of string 
partitioning problems [5]. To put our work in context, we briefly describe the most relevant (for
this paper) results from the papers of Christie and Irving and of Radcliffe, Scott and Wilmer.

The earlier paper [5] gives, in both the case of reversals and transpositions, polynomial-time
algorithms for computing the minimum number of operations to sort a given binary string, as well 
as exact, constructive diameter results on binary strings. Additionally, their proof that computing 
the reversal distance between strings is NP-hard, supports the intuition that distance problems 
are harder than sorting problems on strings. They present upper and lower bounds for computing 
reversal and transposition distance on binary strings. 

The more recent paper [12] gives refined and generalised reversal diameter results for non-fixed
size alphabets. It also gives a polynomial-time algorithm for optimally sorting a ternary
(3 letter alphabet) string with reversals. The authors refer to the prefix reversal counterparts of 
these (and other) results as interesting open problems. They further provide an alternative proof 
of Christie and Irving's NP-hardness result for reversals, and sketch a proof that computing the 
transposition distance between binary strings is NP-hard. As we later note, this proof can also be 
used to obtain a specific reducibility result for prefix reversals. They also have some first results 
on approximation (giving a PTAS - a Polynomial-Time Approximation Scheme - for computing
the distance between dense instances) and on the distance between random strings, both of which 
apply to prefix reversals as well. 

In this paper we supplement the results of Christie and Irving and of Radcliffe, Scott and Wilmer
by their counterparts on prefix reversals.
In Section 3 (Grouping) we introduce a weaker form of sorting where identical symbols need only
be grouped together, while the groups can be in any order. For grouping on binary and ternary
strings we give a complete characterisation of the minimum number of flips required to group a
string, and provide polynomial-time algorithms for computing such an optimal sequence of flips.
(The complexity of grouping over larger fixed size alphabets remains open but as an intermediate
result we describe how a PTAS can be constructed for each such problem.) Grouping aids in
developing a deeper understanding of sorting, which is why we tackle it first. It was also mentioned
as a problem of interest in its own right by Eriksson et al. Then, in Section 4 (Sorting),
we give polynomial-time algorithms (again based on a complete characterisation) for optimally
sorting binary and ternary strings with flips. (The complexity of sorting also remains open for
larger fixed size alphabets. As with grouping we thus provide, as an intermediate result, a PTAS
for each such problem.) In Section 5 we show that the flip diameter on binary strings is $n - 1$, and
on ternary strings (for $n > 3$) lies somewhere between $n - 1$ and $(4/3)n$, with empirical support for
the former. In Section 6 we show that the flip distance problem on binary strings is NP-hard, and
point out that a reduction in [12] also applies to prefix reversals, showing that the flip distance
problem on arbitrary strings is polynomial-time reducible (in an approximation-preserving sense)
to the binary problem. We conclude in Section 7 with a discussion of some of the intriguing open
problems that have emerged during this work. Indeed, our initial exploration has identified many
basic (yet surprisingly difficult) combinatorial problems that deserve further analysis.

2 Preliminaries 

Let $[k]$ denote the first $k$ non-negative integers $\{0, 1, \ldots, k-1\}$. A $k$-ary string is a string over the
alphabet $[k]$, while a string $s$ is said to be fully $k$-ary, or to have arity $k$, if the set of symbols
occurring in it is $[k]$.

We index the symbols in a string $s$ of length $n$ from 1 through $n$: $s = s_1 s_2 \ldots s_n$. Two strings
are compatible if they have the same symbol frequencies (and hence the same length), e.g. 0012
and 1002 are compatible but 0012 and 0112 are not. For a given string $s$, let $I(s)$ be the string
obtained by sorting the symbols of $s$ in non-descending order, e.g. $I(1022011) = 0011122$. The
prefix reversal (flip for short) $f^{(i)}(s)$ reverses the length-$i$ prefix of its argument, which should
have length at least $i$. Alternatively, we denote application of $f^{(i)}(s)$ by underlining the length-$i$
prefix. Thus, $f^{(2)}(2012) = \underline{20}12 = 0212$ and $f^{(3)}(2012) = \underline{201}2 = 1022$. The flip distance $d(s,s')$
between two strings $s$ and $s'$ is defined as the smallest number of flips required to transform $s$
into $s'$ if they are compatible, and $\infty$ otherwise. Since a flip is its own inverse, flip distance is
symmetric.
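
The definitions above can be made concrete with a small Python sketch. The names `flip`, `compatible` and `flip_distance` are illustrative choices of ours, and the breadth-first search is only a brute-force reference for short strings, not one of the algorithms developed in this paper.

```python
import math
from collections import Counter, deque

def flip(s: str, i: int) -> str:
    """Reverse the length-i prefix of s (the prefix reversal f^(i))."""
    if not 1 <= i <= len(s):
        raise ValueError("prefix length out of range")
    return s[:i][::-1] + s[i:]

def compatible(s: str, t: str) -> bool:
    """Two strings are compatible if they have the same symbol frequencies."""
    return Counter(s) == Counter(t)

def flip_distance(s: str, t: str) -> float:
    """Smallest number of flips turning s into t; infinity if incompatible."""
    if not compatible(s, t):
        return math.inf
    seen = {s}
    queue = deque([(s, 0)])
    while queue:
        cur, dist = queue.popleft()
        if cur == t:
            return dist
        for i in range(2, len(cur) + 1):   # a flip of length 1 changes nothing
            nxt = flip(cur, i)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return math.inf
```

For instance, `flip("2012", 3)` reproduces the example $f^{(3)}(2012) = 1022$ from the text.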

The flip sorting distance $d_s(s) = d(s, I(s))$ of a string $s$ is defined as the number of flips of an
optimal sorting sequence to transform $s$ into $I(s)$. An algorithm sorts $s$ optimally if it computes
an optimal sorting sequence for $s$.

In the next two sections we consider strings to be equivalent if one can be transformed into the 
other by repeatedly duplicating symbols and eliminating one of two adjacent identical symbols. 
As representatives of the equivalence classes we take the shortest string in each class. These are 
exactly the strings in which adjacent symbols always differ. We express all flip operations in terms 
of these normalized strings. E.g. we write $f^{(3)}(2012) = \underline{201}2 = 102$. A flip that brings two identical
symbols together, thereby shortening the string by 1, is called a 1-flip, while all others, that leave
the string length invariant, are called 0-flips.
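
As a minimal sketch of this normalization (the helper names `normalize`, `flip_normalized` and `flip_kind` are ours and purely illustrative), a flip on a normalized string can be classified as a 1-flip or a 0-flip by comparing lengths before and after:

```python
def normalize(s: str) -> str:
    """Collapse every run of identical adjacent symbols to a single symbol."""
    out = []
    for c in s:
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)

def flip_normalized(s: str, i: int) -> str:
    """Apply the flip of length i to s and return the normalized result."""
    return normalize(s[:i][::-1] + s[i:])

def flip_kind(s: str, i: int) -> int:
    """For a normalized string s, return 1 for a 1-flip and 0 for a 0-flip."""
    return len(s) - len(flip_normalized(s, i))
```

On the example from the text, `flip_normalized("2012", 3)` gives `"102"`, so the flip of length 3 is a 1-flip.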

We follow the standard notation for regular expressions: a superscript on a substring denotes
the number of repetitions of the substring, with $*$ and $+$ denoting 0-or-more and 1-or-more
repetitions, respectively; $\epsilon$ denotes the empty string; brackets of the form $\{\}$ are used to denote that
a symbol can be exactly one of the elements within the brackets; and the product sign $\prod$ denotes
concatenation of an indexed series. For example $\prod_{i=1}^{3}(10^i 2) = 102100210002$, and $\{1,01\}^*\{\epsilon,0\}$
denotes the set of binary strings with no 00 substring.
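
The two examples above can be checked mechanically. The following sketch uses Python's `re` module; the pattern and the helper names `in_language` and `indexed_product` are our own rendering of the notation, not part of the paper.

```python
import re

# {1,01}*{eps,0}: any number of blocks "1" or "01", optionally followed by one "0".
NO_DOUBLE_ZERO = re.compile(r"(?:1|01)*0?")

def in_language(s: str) -> bool:
    """True if s belongs to {1,01}*{eps,0}."""
    return NO_DOUBLE_ZERO.fullmatch(s) is not None

def indexed_product(lo: int, hi: int) -> str:
    """Concatenate 1 0^i 2 for i = lo..hi, mirroring the product notation."""
    return "".join("1" + "0" * i + "2" for i in range(lo, hi + 1))
```

Comparing `in_language(s)` with `"00" not in s` over all short binary strings confirms that the expression describes exactly the strings without a 00 substring.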

3 Grouping 

The task of sorting a string can be broken down into two subproblems: grouping identical symbols 
together and putting the groups of identical symbols in the right order. Notice that first grouping 
and then ordering may not be the most efficient way to sort strings. Although grouping appears 
to be slightly easier than the sorting problem, essentially the same questions remain open as in 
sorting. Grouping binary strings is trivial and in Section 3.1 we give the grouping distances of all
ternary strings. As a result we give polynomial time algorithms for binary and ternary grouping.
For larger alphabets the grouping problem remains open; as an intermediate result we describe in
Section 3.2 a PTAS for each such problem. While the problems of grouping and sorting are closely
related for strings on small alphabets, the problems diverge when alphabet size approaches the 
string length, with permutations being the limit. 

Recall that we consider only normalized strings, as representatives of equivalence classes. The 
flip grouping distance $d_g(s)$ of a fully $k$-ary string $s$ is defined as the minimum number of flips
required to reduce the string to one of length $k$.

3.1 Grouping binary and ternary strings 

Lemma 1. $d_g(s) \ge n - k$ for any fully $k$-ary string $s$ of length $n$.

Proof. The proof follows from the observations that, after grouping, a fully $k$-ary string $s$ has length
$k$ and that each flip can shorten $s$ by at most 1. □

Lemma 2. $d_g(s) \le n - 2$ for any fully $k$-ary string $s$ of length $n$.

Proof. Consider the following simple algorithm. If the leading symbol occurs elsewhere then a 
1-flip bringing them together exists, so perform this 1-flip. If not, then we use a 0-flip to put this 
symbol in front of a suffix in which we accumulate uniquely appearing symbols. Repeat until the 
string is grouped. 






Clearly no more than $n - k$ 1-flips will be necessary. Also, no more than $k - 2$ 0-flips will
ever be necessary, because after $k - 2$ 0-flips the prefix of the string will consist of only two
types of symbol, and the algorithm will never perform a 0-flip on such a string. Thus at most
$(n - k) + (k - 2) = n - 2$ flips in total will be needed. □
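
The simple algorithm from this proof can be sketched as follows. This is our own reading of it, under the assumption that the "suffix in which we accumulate uniquely appearing symbols" is tracked by its length; the names `greedy_group` and `unique_suffix` are hypothetical.

```python
def normalize(s: str) -> str:
    """Collapse every run of identical adjacent symbols to a single symbol."""
    out = []
    for c in s:
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)

def greedy_group(s: str):
    """Group s greedily; return (list of flip lengths, final grouped string)."""
    s = normalize(s)
    k = len(set(s))
    unique_suffix = 0          # symbols parked at the end, each occurring once
    flips = []
    while len(s) > k:
        j = s.find(s[0], 1)    # next occurrence of the leading symbol, if any
        if j != -1:
            i = j              # 1-flip: s[0] lands directly in front of s[j]
        else:
            i = len(s) - unique_suffix   # 0-flip: park s[0] before the suffix
            unique_suffix += 1
        s = normalize(s[:i][::-1] + s[i:])
        flips.append(i)
    return flips, s
```

On `"210212"` the sketch uses the flip lengths `[3, 5, 2, 2]`, i.e. $n - 2 = 4$ flips, in line with the bound of Lemma 2; it makes no attempt to find a shorter sequence when one exists.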

As a corollary we obtain the grouping distance of binary strings. 
Theorem 1. $d_g(s) = n - 2$ for any fully binary string $s$ of length $n$. □

We will now define a class of bad ternary strings and prove that these are the only ternary strings
that need $n - 2$ rather than $n - 3$ flips to be grouped.

Definition 1. We define bad strings as all fully ternary strings of one of the following types, up
to relabeling:

I. strings of length greater than 3, in which the leading symbol appears only once: $0(12)^+$ and
$02(12)^+$

II. strings having identical symbols at every other position, starting from the last: $(\{0,1\}2)^+$ and
$(2\{0,1\})^+2$

III. odd length strings whose leading symbol appears exactly once more, at an even position, and
both occurrences are followed by the same symbol: $0(21)^+02(12)^*$

IV. the following strings:
$X_1 = 210212$, $X_2 = 021012$, $X_3 = 0120212$, $X_4 = 1201212$, $X_5 = 02101212$, $X_6 = 20210212$,
$X_7 = 020210212$, $X_8 = 120120212$.

All other fully ternary strings are good. Strings of type I, II and III, shortly I-, II-, and III-strings,
respectively, are called generically bad, or g-bad for short.

Lemma 3. $d_g(s) = n - 2$ if ternary string $s$ of length $n$ is bad.

Proof. Because of Lemmas 1 and 2 it suffices to show that in each case a 0-flip is necessary:
I-strings admit only 0-flips. A 1-flip on a II-string leads to a II-string and eventually to a I-string.
Any III-string admits only one 1-flip, leading to a II-string. For IV-strings, Table 1 shows that each
possible 1-flip leads to either a shorter IV-string, or to a I-, II-, or III-string. □





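String              1-flips
$X_1 = 210212$      $\underline{210}212 = 01212$ is of type I
                    $\underline{21021}2 = 12012$ is of type III
$X_2 = 021012$      $\underline{021}012 = 12012$ is of type III
$X_3 = 0120212$     $\underline{012}0212 = 210212 = X_1$
$X_4 = 1201212$     $\underline{120}1212 = 021212$ is of type I and II
                    $\underline{12012}12 = 210212 = X_1$
$X_5 = 02101212$    $\underline{021}01212 = 1201212 = X_4$
$X_6 = 20210212$    $\underline{20}210212 = 0210212$ is of type III
                    $\underline{20210}212 = 0120212 = X_3$
                    $\underline{2021021}2 = 1201202$ is of type III
$X_7 = 020210212$   $\underline{02}0210212 = 20210212 = X_6$
                    $\underline{02021}0212 = 12020212$ is of type II
$X_8 = 120120212$   $\underline{120}120212 = 02120212$ is of type II
                    $\underline{1201202}12 = 20210212 = X_6$

Table 1. Type IV strings and all their 1-flips.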
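
The entries of Table 1 can be re-derived mechanically. The sketch below (helper names `normalize` and `one_flips` are ours) lists, for a normalized string, the result of every 1-flip, i.e. every flip whose prefix is immediately followed by a copy of the leading symbol.

```python
def normalize(s: str) -> str:
    """Collapse every run of identical adjacent symbols to a single symbol."""
    out = []
    for c in s:
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)

def one_flips(s: str) -> list:
    """Results of all 1-flips on the normalized string s, by increasing flip length."""
    head = s[0]
    # A flip of length i is a 1-flip exactly when the symbol after the prefix equals head.
    return [normalize(s[:i][::-1] + s[i:]) for i in range(2, len(s)) if s[i] == head]
```

For example, `one_flips("210212")` returns `["01212", "12012"]`, the two children listed for $X_1$, while a type I string such as `"01212"` has no 1-flips at all.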
Lemma 4. $d_g(s) = n - 3$ if ternary string $s$ of length $n$ is good.

Proof. The proof is by induction on $n$. The induction basis for $n = 3$ is trivial. We show the
statement for strings of length $n + 1$ by showing that if a bad string $s'$ of length $n$ can be obtained
through a 1-flip from a good (parent) string $s$ of length $n + 1$, then $s$ admits another 1-flip which
leads to a good string. Note that a 1-flip $f^{(i)}(s) = s'$ brings symbols $s_1$ and $s_{i+1}$ together, hence
$s_1 = s_{i+1} \ne s_i = s'_1$, which shows that the symbol deleted from parent $s$ differs from the leading
symbol of child $s'$. We enumerate all possible bad child strings $s'$ and distinguish cases based on
the leading symbol of good parent $s$.

For IV-strings, Table 2 lists all parents with, for each good parent, a 1-flip to a good string. It
remains to prove that for each g-bad string all parents are either bad or have a g-1-flip, defined as
a 1-flip resulting in a string that is not g-bad (i.e. either good or of type IV).

Type I, odd: $0(12)^+$ has possible parents starting with:
1: $1(21)^i 012(12)^j$ with $i + j > 0$:
    If $i > 0$ there is a g-1-flip $\underline{12}1(21)^{i-1}012(12)^j = (21)^i 012(12)^j$;
    If $i = 0$ and $j > 0$ there is a g-1-flip $\underline{1012}(12)^j = 210(12)^j$;
2: $21(21)^i 02(12)^j$ with $i + j > 0$:
    If $i > 0$ there is a g-1-flip $\underline{21}(21)^i 02(12)^j = 1(21)^i 02(12)^j$;
    If $i = 0$ and $j > 1$ there is a g-1-flip $\underline{21021}2(12)^{j-1} = 120(12)^j$;
    If $i = 0$ and $j = 1$ the parent is $210212 = X_1$.
Type I, even: these strings are also of type II, see below.
Type II, odd: $(2\{0,1\})^+2$ has only parents of type II.
Type II, even: $02(\{0,1\}2)^*$ has possible parents starting with:
2: $2(\{0,1\}2)^*$ is of type II;
1: $12(\{0,1\}2)^*012(\{0,1\}2)^*$ with three cases for a possible third 1:
    None: parent is $12(02)^*012(02)^*$, which is of type III;
    Before 01: then there is a g-1-flip
        $\underline{12(\{0,1\}2)^*}12(\{0,1\}2)^*012(\{0,1\}2)^* = 2(\{0,1\}2)^*12(\{0,1\}2)^*012(\{0,1\}2)^*$;
    After 01: then there is a g-1-flip
        $\underline{12(\{0,1\}2)^*012(\{0,1\}2)^*}12(\{0,1\}2)^* = 2(\{0,1\}2)^*102(\{0,1\}2)^*12(\{0,1\}2)^*$.
Type III: $0(21)^+02(12)^*$ has possible parents starting with:
1: $(12)^i 01(21)^j 02(12)^k$ with $i > 0$:
    If $i > 1$ there is a g-1-flip $\underline{12}(12)^{i-1}01(21)^j 02(12)^k = 2(12)^{i-1}01(21)^j 02(12)^k$;
    If $i = 1$, $j > 0$ there is a g-1-flip $\underline{12012}1(21)^{j-1}02(12)^k = 21021(21)^{j-1}02(12)^k$;
    If $i = 1$, $j = 0$, $k > 0$ there is a g-1-flip $\underline{120102}(12)^k = 20102(12)^k$;
    If $i = 1$, $j = k = 0$ then the parent is $120102 = X_2$ (relabelled);
1: $(12)^+0(12)^+0(12)^+$: there is a g-1-flip $\underline{(12)^+0}(12)^+0(12)^+ = 0(21)^+20(12)^+$;
2: $2(12)^*0(21)^+02(12)^*$: there is a g-1-flip $\underline{2(12)^*0(21)^+0}2(12)^* = 0(12)^+0(21)^*2$;
2: $(21)^i 20(12)^j 02(12)^k$ with $j > 0$:
    If $i = 0$, $j = 1$ then the parent is $210212 = X_1$;
    If $i + j > 1$ then $\underline{(21)^i 201}2(12)^{j-1}02(12)^k = 102(12)^{i+j-1}02(12)^k$ is a g-1-flip. □
The following theorem results directly from the above lemmas. 

Theorem 2. $d_g(s) = n - 2$ if and only if fully ternary string $s$ of length $n$ is bad, and $d_g(s) = n - 3$
otherwise. Moreover, there exists a polynomial time algorithm for grouping ternary strings with a
minimum number of flips.

Proof. The first statement is direct from Lemmas 3 and 4. In case string $s$ is bad, which by
Definition 1 can be decided in polynomial time, the algorithm implicit in the proof of Lemma 2
shows how to group $s$ optimally in polynomial time. Otherwise, we repeatedly find a 1-flip to a
good string as guaranteed by Lemma 4. The time complexity is $O(n^3)$, since grouping distance,
number of choices for a 1-flip, and time to perform a flip and test whether its result is good are
all $O(n)$. □
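
For small instances the statement of Theorem 2 can be cross-checked by exhaustive search. The following brute-force sketch is not the polynomial time algorithm of the theorem; `grouping_distance` is a hypothetical helper of ours that runs a breadth-first search over normalized strings.

```python
from collections import deque

def normalize(s: str) -> str:
    """Collapse every run of identical adjacent symbols to a single symbol."""
    out = []
    for c in s:
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)

def grouping_distance(s: str) -> int:
    """Minimum number of flips that reduce normalized s to length k (its arity)."""
    s = normalize(s)
    k = len(set(s))
    seen = {s}
    queue = deque([(s, 0)])
    while queue:
        cur, dist = queue.popleft()
        if len(cur) == k:          # every symbol occurs once: the string is grouped
            return dist
        for i in range(2, len(cur) + 1):
            nxt = normalize(cur[:i][::-1] + cur[i:])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise AssertionError("unreachable: every string can be grouped")
```

For the bad string $X_1 = 210212$ the search returns 4 $= n - 2$, whereas a good string such as 0102 is grouped with a single flip.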
String              Parents
$X_1 = 210212$      1210212; 0120212 = $X_3$; 1201212 = $X_4$
$X_2 = 021012$      2021012; 1201012; 1012012; 2010212
$X_3 = 0120212$     10120212; 21020212; 20210212 = $X_6$; 12021012; 20212012
$X_4 = 1201212$     21201212; 02101212 = $X_5$; 21021212; 21210212
$X_5 = 02101212$    202101212; 120101212; 101201212; 210120212; 121012012; 202010212
$X_6 = 20210212$    020210212 = $X_7$; 120210212; 012020212; 120120212 = $X_8$
$X_7 = 020210212$   2020210212; 2020210212; 1202010212; 2012020212; 1201202012; 2021021212
$X_8 = 120120212$   2120120212; 0210120212; 2102120212; 0210210212; 2021021212; 2120210212

Table 2. Type IV strings, their parents, and for each good parent, a 1-flip to a good string.
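
The parents in Table 2 follow from the observation in the proof of Lemma 4: a parent is obtained by reversing a prefix of the child and re-inserting a copy of the prefix's last symbol. A minimal sketch, with our own helper names `parents` and `one_flips`:

```python
def normalize(s: str) -> str:
    """Collapse every run of identical adjacent symbols to a single symbol."""
    out = []
    for c in s:
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)

def one_flips(s: str) -> list:
    """Results of all 1-flips on the normalized string s."""
    return [normalize(s[:i][::-1] + s[i:]) for i in range(2, len(s)) if s[i] == s[0]]

def parents(child: str) -> list:
    """Normalized strings one symbol longer than child that have a 1-flip to child."""
    result = []
    for i in range(2, len(child) + 1):
        # The deleted symbol equals child[i-1]; it must differ from child[0],
        # otherwise the parent would contain two equal adjacent symbols.
        if child[i - 1] != child[0]:
            result.append(child[:i][::-1] + child[i - 1] + child[i:])
    return result
```

`parents("210212")` returns `["1210212", "0120212", "1201212"]`, matching the row for $X_1$. For $X_2$ the last parent computed this way is 2101202, which corresponds to the table entry 2010212 after swapping the labels 0 and 1.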



3.2 Grouping strings over larger alphabets 

Lemmas 1 and 2 say that $n - k \le d_g(s) \le n - 2$ for any fully $k$-ary string $s$. For any $k$ there are fully
$k$-ary strings that have flip grouping distance equal to $n - 2$. For example, the length $n = 2(k - 1)$
string $1020\ldots(k-1)0$ requires, for every 1-flip, that a 0 first be brought to the front, and hence we need as
many 0-flips as 1-flips, and $d_g(1020\ldots(k-1)0) \ge 2(k - 2) = 2k - 4 = n - 2$. Computer calculations
suggest that for $k = 4$ and $k = 5$, for $n$ large enough, the strings with grouping distance $n - 2$
are precisely those having identical symbols at every other position, starting from the last (i.e.
type II of Definition 1). Proving (or disproving) this statement remains open, as well as finding a
polynomial time algorithm for grouping $k$-ary strings for any fixed $k > 3$. We do, however, have
the following intermediate result:

Theorem 3. For every fixed $k$ there is a PTAS for grouping $k$-ary strings.

Proof. We show that, for every fixed $k$ and for every fixed $\epsilon > 0$, there is a polynomial-time
algorithm that, given any $k$-ary string $s$ of length $n$, computes a sequence of flips which groups $s$ in at
most $(1 + \epsilon)d_g(s)$ flips. We assume $k \ge 4$ because for $k = 2$ and $k = 3$ the exact algorithms suffice.
Let $N = (k - 2)/\epsilon + k$. We distinguish two cases.

Case 1. If $n \ge N$ we use the simple, "greedy" algorithm described in the proof of Lemma 2.
This will group $s$ in $\hat{d}_g(s)$ flips with $\hat{d}_g(s) \le n - 2$. This together with the lower bound of $n - k$
on $d_g(s)$ from Lemma 1 gives $\hat{d}_g(s) \le d_g(s) + (k - 2) \le (1 + \epsilon)d_g(s)$.

Case 2. If $n < N$ we compute $d_g(s)$ by a brute force algorithm which simply chooses the best
amongst all possible flip sequences of length $n - 2$: there are $n^{n-2}$ of these. This yields the optimal
solution since $d_g(s) \le n - 2$ (Lemma 2). The running time in this case is bounded by a constant.
□
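
The inequality chain in Case 1 can be spelled out step by step. Writing $\hat{d}_g(s)$ for the number of flips used by the greedy algorithm (our notation), and using only Lemmas 1 and 2 and the choice of $N$:

```latex
\begin{align*}
\hat{d}_g(s) &\le n - 2 = (n - k) + (k - 2) \le d_g(s) + (k - 2)
  && \text{(Lemma 2, then Lemma 1)} \\
n \ge N = \frac{k-2}{\epsilon} + k
  &\;\Longrightarrow\; k - 2 \le \epsilon\,(n - k) \le \epsilon\, d_g(s)
  && \text{(rearranging, then Lemma 1)} \\
\hat{d}_g(s) &\le d_g(s) + \epsilon\, d_g(s) = (1 + \epsilon)\, d_g(s).
\end{align*}
```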

Clearly, there is a strong relationship between grouping and sorting. Understanding grouping
may help us to understand sorting, and lead to improved bounds (especially as the length of
strings becomes large relative to their arity), because for a $k$-ary string $s$, we have $d_g(s) \le d_s(s) \le
d_g(s) + wc(k)$, with $wc(k)$ the flip diameter on permutations with $k$ elements, as defined before.

Also $d_g(s) = \min\{d_s(t) : t \text{ a relabeling of } s\}$, which gives (for fixed $k$) a polynomial time
reduction from grouping to sorting. Thus every polynomial time algorithm for sorting by prefix
reversals directly gives a polynomial time algorithm for the grouping problem (for fixed $k$).
4 Sorting



In this section we present results on sorting similar to those on grouping in the previous section.
The flip sorting distance also remains open for strings over alphabets of size larger than 3. As an
intermediate result we thus provide at the end of this section a PTAS for each such problem.

Again a 1-flip brings identical symbols together and thus shortens the representative of the 
equivalence class under symbol duplication. But since symbol order matters for sorting, relabelled 
strings are no longer equivalent. As in grouping, sorting of binary strings is straightforward: 

Theorem 4. $d_s(s) = n - 2$ for every fully binary string $s$ of length $n$ with $s_n = 1$, and $d_s(s) = n - 1$
otherwise.

Proof. Exactly $n - 2$ 1-flips suffice and are necessary to arrive at length 2 string 01 or 10. If the
last symbol is 0 an additional 0-flip is necessary, putting a 1 at the end. All these flips can be $f^{(2)}$.
□
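
Theorem 4 is easy to confirm by exhaustive search on short strings. The sketch below is a brute-force check under our own naming (`sorting_distance`); it works on normalized strings, so a fully binary string is simply an alternating string.

```python
from collections import deque

def normalize(s: str) -> str:
    """Collapse every run of identical adjacent symbols to a single symbol."""
    out = []
    for c in s:
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)

def sorting_distance(s: str) -> int:
    """Minimum number of flips that sort normalized s (breadth-first search)."""
    s = normalize(s)
    target = normalize("".join(sorted(s)))
    seen = {s}
    queue = deque([(s, 0)])
    while queue:
        cur, dist = queue.popleft()
        if cur == target:
            return dist
        for i in range(2, len(cur) + 1):
            nxt = normalize(cur[:i][::-1] + cur[i:])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise AssertionError("unreachable: every string can be sorted")
```

For example, `sorting_distance("1010")` is 3 $= n - 1$, and `sorting_distance("0101")` is 2 $= n - 2$.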

From Lemma 1 we know that $d_g(s) \ge n - 3$ and hence $d_s(s) \ge n - 3$ for every ternary string
$s$ of length $n$. In the upper bound on $d_s(s)$ we derive below we focus on strings $s$ ending in a 2
($s_n = 2$), since sorting distance is invariant under appending a 2 to a string. It turns out that,
when sorting a ternary string ending in a 2, one needs at most one 0-flip, except for the string
0212.

Lemma 5. $d_s(s) \le n - 2$ for every fully ternary string $s$ of length $n$ with $s_n = 2$, except 0212.

Proof. It is easy to check that 0212 requires 3 flips to be sorted. By induction on $n$ we prove the
rest of the lemma. The basis case of $n = 3$ is trivial. For a string $s$ of length $n > 3$ we distinguish
three cases:

- $s_{n-1} = 0$: If $s = 20102$ it is sorted in 3 flips: $\underline{2010}2 \to \underline{01}02 \to \underline{10}2 \to 012$. Otherwise, by
induction and relabeling $0 \leftrightarrow 2$, the string $s_1 \ldots s_{n-1}$ can be reduced to 210 in $n - 3$ flips (to
20 or 10 by Theorem 4 if $s_1 \ldots s_{n-1}$ has only two symbols), and one more flip sorts $s$ to 012.

- $s_{n-1} = 1$, $s_1 = 0$ and 0 appears only once: Thus $s = 0(12)^+$ or $s = 02(12)^+$. Then $s$ can
be sorted with only one 0-flip: $\underline{0(12)^+1}2 \to 1(21)^+02 \to \ldots \to 2102 \to 012$ or, respectively,
$\underline{02}(12)^+12 \to \underline{20(12)^+1}2 \to (12)^+102 \to \ldots \to 2102 \to 012$.

- $s_{n-1} = 1$, $s_1$ not unique:
If $s = 12012$ then 3 flips suffice: $\underline{12}012 \to \underline{2101}2 \to \underline{10}12 \to 012$.
Otherwise, since the other 2 parents of 0212 can flip to 1202, there is a 1-flip to a string $\ne 0212$
to which we can apply the induction hypothesis. □

As in Section 3 we characterise the strings ending in a 2 that need $n - 2$ rather than $n - 3$
flips to sort.

Definition 2. We define bad strings as all fully ternary strings ending in a 2 of the types:

I. $0(12)^{\ge 2}$
II. $(\{0,1\}2)^+$ and $2(\{0,1\}2)^+$
III. $(\{1,2\}0)^+2$ and $0(\{1,2\}0)^+2$
IV. $(\{1,2\}0)^+12$ and $(0\{1,2\})^+012$ with at least two 2s.
V. $(01)^*0212$ and $(10)^+212$
VI. $1(20)^+1(20)^*2$ and $0(21)^+0(21)^*2$
VII. $1(02)^+1(02)^+$
VIII. $1(02)^+12$
IX. 77 strings of length at most 11, shown in Table 3.

All other fully ternary strings ending in a 2 are good strings. Strings of type I-VIII (I-strings ...
VIII-strings for short) are called generically bad, or g-bad for short.
$Y_1 = 210212$       $Y_{21} = 10212012$    $Y_{41} = 021202012$   $Y_{61} = 0210212012$
$Y_2 = 021012$       $Y_{22} = 02121012$    $Y_{42} = 021201012$   $Y_{62} = 1021202012$
$Y_3 = 212012$       $Y_{23} = 02120102$    $Y_{43} = 020210212$   $Y_{63} = 1021201012$
$Y_4 = 120102$       $Y_{24} = 10102102$    $Y_{44} = 101020212$   $Y_{64} = 1020210212$
$Y_5 = 201202$       $Y_{25} = 02010212$    $Y_{45} = 020212012$   $Y_{65} = 1010210202$
$Y_6 = 0210202$      $Y_{26} = 21202012$    $Y_{46} = 212010202$   $Y_{66} = 0202010212$
$Y_7 = 1021202$      $Y_{27} = 21201012$    $Y_{47} = 212012012$   $Y_{67} = 2120202012$
$Y_8 = 0212012$      $Y_{28} = 21201202$    $Y_{48} = 010210212$   $Y_{68} = 2120102012$
$Y_9 = 2120102$      $Y_{29} = 20210212$    $Y_{49} = 010210202$   $Y_{69} = 2021021212$
$Y_{10} = 0102102$   $Y_{30} = 01021202$    $Y_{50} = 010212012$   $Y_{70} = 2010212012$
$Y_{11} = 1212012$   $Y_{31} = 01020212$    $Y_{51} = 202010212$   $Y_{71} = 1201021202$
$Y_{12} = 2010212$   $Y_{32} = 20212012$    $Y_{52} = 121202012$   $Y_{72} = 1201202012$
$Y_{13} = 0120212$   $Y_{33} = 12120102$    $Y_{53} = 121201202$   $Y_{73} = 10202010212$
$Y_{14} = 1201012$   $Y_{34} = 12010212$    $Y_{54} = 201021202$   $Y_{74} = 02120102012$
$Y_{15} = 1201212$   $Y_{35} = 12010202$    $Y_{55} = 120212012$   $Y_{75} = 02021021212$
$Y_{16} = 2012012$   $Y_{36} = 20120102$    $Y_{56} = 012021212$   $Y_{76} = 21201202012$
$Y_{17} = 10210212$  $Y_{37} = 12012012$    $Y_{57} = 120102012$   $Y_{77} = 12120202012$
$Y_{18} = 21021212$  $Y_{38} = 021021202$   $Y_{58} = 201202012$
$Y_{19} = 02102012$  $Y_{39} = 102120102$   $Y_{59} = 120120212$
$Y_{20} = 02101212$  $Y_{40} = 102010212$   $Y_{60} = 201201012$

Table 3. Type IX strings



This definition makes 0212 a bad string as well. From Lemma 5 we know that 0212 is the only
ternary string ending in a 2 with sorting distance $n - 1$.

Theorem 5. String 0212 has sorting distance 3. Any other fully ternary string $s$ of length $n$ with
$s_n = 2$ has prefix reversal sorting distance $n - 2$ if it is bad and $n - 3$ if it is good. A fully ternary
string $s$ ending in a 0 or 1 has the same sorting distance as $s2$.

Proof. Directly from Lemmas 6 and 7 below. Note that every sorting sequence for $s$ sorts $s2$ as
well while every sorting sequence for $s2$ can be modified to avoid flipping the whole string and
thus works for $s$ as well. □
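
The distances stated in Lemma 5 and Theorem 5 can be spot-checked on short strings by exhaustive search. The sketch below is a brute-force check with our own helper name `sorting_distance`; it is not the polynomial-time procedure implied by the characterisation.

```python
from collections import deque

def normalize(s: str) -> str:
    """Collapse every run of identical adjacent symbols to a single symbol."""
    out = []
    for c in s:
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)

def sorting_distance(s: str) -> int:
    """Minimum number of flips that sort normalized s (breadth-first search)."""
    s = normalize(s)
    target = normalize("".join(sorted(s)))
    seen = {s}
    queue = deque([(s, 0)])
    while queue:
        cur, dist = queue.popleft()
        if cur == target:
            return dist
        for i in range(2, len(cur) + 1):
            nxt = normalize(cur[:i][::-1] + cur[i:])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise AssertionError("unreachable: every string can be sorted")
```

The search gives distance 3 for 0212, for 20102 and for 12012, as used in the proof of Lemma 5, and distance 4 $= n - 2$ for the type IX string $Y_1 = 210212$.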

Lemma 6. $d_s(s) = n - 2$ for every bad ternary string $s \ne 0212$ of length $n$.

Proof. Since $d_s(s) \ge n - 3$ and any 1-flip decreases the length of the string by 1, Lemma 5 says it
suffices to show that for each type in Definition 2 a 0-flip is necessary.

- For I-strings only 0-flips are possible.

- A 1-flip on a II- or III-string leads to a string of the same type, so that eventually no 1-flip is
possible.

- A 1-flip on a IV-string leads either again to a IV-string or (when destroying the 12 suffix) to
a III-string.

- A 1-flip on a V-string leads either again to a V-string or (when destroying the suffix with a
...02 12 flip) to a IV-string. Flips ...212 and ...021 2 are not possible for lack of more 2's.

- For VI-, VII- and VIII-strings only one 1-flip is possible, leading to II-, III- and
IV-strings respectively.

- For IX-strings, a table in the appendix lists all possible 1-flips ultimately leading to a string
of type I-VIII. □

Lemma 7. $d_s(s) = n - 3$ for every good ternary string $s$ of length $n$.

Proof. The proof is by induction on $n$ and is similar to the proof of Lemma 4. The induction basis
for $n = 3$ is again trivial. We prove that for each g-bad string of length $n$ all parents (of length
$n + 1$) are either bad or have a 1-flip to a string that is not g-bad (i.e. either good or of type IX).
Remember that such a flip is called a g-1-flip. That for each IX-string all parents are either bad
or have a 1-flip to a good string is proved by case checking in a table in the appendix. Together
this proves that every good string of length $n + 1$ has a 1-flip to a good string of length $n$ and
therefore the lemma.

Type I: $0(12)^+$ has possible parents starting with:
1: $1(21)^i 0(12)^j$ with $j > 0$:
    If $i > 0$: there is a g-1-flip $\underline{12}1(21)^{i-1}0(12)^j = (21)^i 0(12)^j$;
    If $i = 0$, $j > 1$: there is a g-1-flip $\underline{1012}(12)^{j-1} = 210(12)^{j-1}$;
    If $i = 0$, $j = 1$: there is a g-1-flip $\underline{10}12 = 012$;
2: $(21)^i 02(12)^j$ with $i > 0$:
    If $i > 1$: there is a g-1-flip $\underline{21}21(21)^{i-2}02(12)^j = 1(21)^{i-1}02(12)^j$;
    If $i = 1$, $j > 0$: there is a g-1-flip $\underline{21021}2(12)^{j-1} = 120(12)^j$;
    If $i = 1$, $j = 0$: there is a g-1-flip $\underline{210}2 = 012$.
Type II, even: $(\{0,1\}2)^+$ has possible parents starting with:
0: $0(2\{0,1\})^*2102(\{0,1\}2)^*$, with three cases for a possible third 0:
    None: the parent is of type VI;
    Before 2102: there is a g-1-flip
        $\underline{0(2\{0,1\})^*2}0(2\{0,1\})^*2102(\{0,1\}2)^* = 2(\{0,1\}2)^*0(2\{0,1\})^*2102(\{0,1\}2)^*$;
    After 2102: there is a g-1-flip
        $\underline{0(2\{0,1\})^*2102(\{0,1\}2)^*}02(\{0,1\}2)^* = (2\{0,1\})^*2012(\{0,1\}2)^*02(\{0,1\}2)^*$;
1: $1(2\{0,1\})^*2012(\{0,1\}2)^*$, with three cases for a possible third 1:
    None: the parent is of type VI;
    Before 2012: there is a g-1-flip
        $\underline{1(2\{0,1\})^*2}1(2\{0,1\})^*2012(\{0,1\}2)^* = 2(\{0,1\}2)^*1(2\{0,1\})^*2012(\{0,1\}2)^*$;
    After 2012: there is a g-1-flip
        $\underline{1(2\{0,1\})^*2012(\{0,1\}2)^*}12(\{0,1\}2)^* = (2\{0,1\})^*2102(\{0,1\}2)^*12(\{0,1\}2)^*$;
2: $2\{0,1\}(2\{0,1\})^*2(\{0,1\}2)^*$ is of type II.
Type II, odd: $2(\{0,1\}2)^+$ has possible parents starting with:
0: $0(2\{0,1\})^*202(\{0,1\}2)^*$ is of type II;
1: $1(2\{0,1\})^*212(\{0,1\}2)^*$ is of type II.
Type III, even: $0(\{1,2\}0)^+2$ has possible parents starting with:
1: $1(0\{1,2\})^+010(\{1,2\}0)^*2$ is of type III;
2: $2(0\{1,2\})^+020(\{1,2\}0)^*2$ is of type III;
2: $2(0\{1,2\})^+02$ is of type III.
Type III, odd: $(\{1,2\}0)^+2$ has possible parents starting with:
0: $0\{1,2\}(0\{1,2\})^*0(\{1,2\}0)^*2$ is of type III;
1: $1(0\{1,2\})^*0210(\{1,2\}0)^*2$, there are three cases for a possible third 1:
    None: the parent is of type VII;
    Before 0210: there is a g-1-flip
        $\underline{1(0\{1,2\})^*0}1(0\{1,2\})^*0210(\{1,2\}0)^*2 = 0(\{1,2\}0)^*1(0\{1,2\})^*0210(\{1,2\}0)^*2$;
    After 0210: there is a g-1-flip
        $\underline{1(0\{1,2\})^*0210(\{1,2\}0)^*}10(\{1,2\}0)^*2 = (0\{1,2\})^*0120(\{1,2\}0)^*10(\{1,2\}0)^*2$;
2: $\underline{2(0\{1,2\})^*0120(\{1,2\}0)^*}2 = (0\{1,2\})^*0210(\{1,2\}0)^*2$ is a g-1-flip
    unless this last string is 02102 (type VI), but then the parent is $201202 = Y_5$;
2: $2(0\{1,2\})^*012$ is of type IV.
Type IV, even: $(\{1,2\}0)^+12$ with a second 2, has possible parents starting with:
0: $0(\{1,2\}0)^+12$, with a second 2, is of type IV;
1: $\underline{1(0\{1,2\})^*0210(\{1,2\}0)^*}12 = (0\{1,2\})^*0120(\{1,2\}0)^*12$ is a g-1-flip;
1: $1(0\{1,2\})^*0212$, with three cases:
    No third 2: the parent is of type V;
    No third 1: the parent is of type VIII;
    Otherwise: $\underline{1(0\{1,2\})^*0}1(0\{1,2\})^*0212 = 0(\{1,2\}0)^*1(0\{1,2\})^*0212$ (with a third 2) is
    a g-1-flip;
2: $2(0\{1,2\})^*0120(\{1,2\}0)^*12$, with four cases:
    A fourth 2 before 0120: there is a g-1-flip
        $\underline{2(0\{1,2\})^*0}2(0\{1,2\})^*0120(\{1,2\}0)^*12 = 0(\{1,2\}0)^+120(\{1,2\}0)^*12$;
    A fourth 2 after 0120: there is a g-1-flip
        $\underline{2(0\{1,2\})^*0120(\{1,2\}0)^*}20(\{1,2\}0)^*12 = (0\{1,2\})^*0210(\{1,2\}0)^+12$;
    A third 1: $\underline{2(0\{1,2\})^*0120(\{1,2\}0)^*1}2 = 1(0\{1,2\})^*0210(\{1,2\}0)^*2$ is a g-1-flip;
    Otherwise: $2012012 = Y_{16}$;
2: $\underline{21(0\{1,2\})^*0}2(0\{1,2\})^*012 = 0(\{1,2\}0)^*12(0\{1,2\})^*012$ is a g-1-flip.
Type IV, odd: $0(\{1,2\}0)^+12$ with a second 2, has possible parents starting with:
1: $1(0\{1,2\})^+012$, with a second 2, is of type IV;
2: $2(0\{1,2\})^+012$ is of type IV;
2: $\underline{21(0\{1,2\})^*0}2(0\{1,2\})^*02 = 0(\{1,2\}0)^*12(0\{1,2\})^*02$ is a g-1-flip.

Type V, even: 0(10)"'"212 (0212 is also of type I), has possible parents starting with: 

1: (10)+212 is of type V 

1: 1201(01)*012 = 021(01)*012 is a g-l-flip; 

2: 2(01)+021 2 = 120(10)+2 is a g-l-flip; 
2: 212(01)+02 = 12(01)+02 is a g-l-flip. 
Type V, odd: (10)"''212, has possible parents starting with: 
0: (01)+0212 is of type V 
2: 2(01)+21 2 = 12(10)+2 is a g-l-flip; 

2: 212(01)*2 = 12(01)*2 is a g-l-flip unless i = 1, but then the parent is 212012 = Y3. 
Type VI, 1(20)+1(20)*2: has possible parents starting with: 
0: (02)n0(20)n(20)'=2 with i > 0: 

If i > 1: 0202(02)*-2io(20)n(20)'=2 = 2(02)^-il0(20)n(20)'=2 is a g-l-flip; 

If j = 1, j > 0: 02102 0f20V-H(20)^2 = 201(20)n(20)'^2 is a g-l-flip; 

If i = 1, j = 0, A; > 0: 0210120 (20) '="12 = 2101(20)'=2 is a g-l-flip; 

If i = 1, j = fc = 0: 021012 = Y2; 
0: (02)+l (02)+l(02)+ = 1(20)+21(02)+ is a g-l-flip; 

2: 2(02)*1(20)+1(20)* 2 = (02)*1(02)+1(20)*2 is a g-l-flip unless this last string is 10212 
(type V) or 0210212 (type VI), but then the parent is 212012 = Y3 or 21201202 = ¥23 
respectively; 

2: 2(02)*l(02)+l(20)+ 2 = (02)+l(20) + l(20)*2 is a g-l-flip; 

2: 2(02)*102(02)*12 = 01(20)*212 is a g-l-flip unless there is no second 0, but then the parent 
is 210212 = Yi. 

Type VI, 0(21)+0(21)*2: has possible parents starting with: 
1: (12)*01(21)^0(21)'^'2 with i > 0: 

If i > 1: 1212(12)'-2oi(21)JO(21)'=2 = 2(12)^-i01(21)J0(21)'=2 is a g-l-flip; 

If i = 1, j > 0: 120121(21)J-iO(21)'=2 = 210(21)J0(21)'=2 is a g-l-flip; 

If i = 1, j = 0, fc > 0: 120102 1(21)^-^2 = 2010(21)^=2 is a g-l-flip; 

It i = I, j = k = 0: 120102 = Ki; 
1: (12)+0 (12)+0(12)+ = 0(21)+20(12)+ is a g-l-flip; 

2: 2(12)*0(21)+0(21)* 2 = (12)*0(12)+0(21)*2 is a g-l-flip unless this last string is 1201202 

(type VI), but then the parent is 20210212 = Y29; 
2: 2(12)*0(12)+0(21)+ 2 = (12)+0(21)+0(21)*2 is a g-l-flip; 

2: 2(12)*012(12)*02 = 10(21)*202 is a g-l-flip unless this last string is 10202 (type III), but 
then the parent is 201202 = Y5. 
Type VII: 1(02)+1(02)+ has possible parents starting with: 

0: 0(20)*1(02)+1 (02)+ = 1(20)+1(02)+ is a g-l-flip; 

0: 0(20)*12 0(20)*1(02)+ = 210(20)*1(02)+ is a g-l-flip; 

2: (20)+12(02)*10 2(02)* = 01(20)*21(02)+ is a g-l-flip; 

2: (20)+l (20)+12"(02)* = 1(02)+012(02)* is a g-l-flip. 
Type VIII: 1(02)+12 has possible parents starting with: 

0: 0(20)n(02)n2 with j > 0: 

If j > 0: 020(20)'-il(02)n2 = (20)U(02)n2 is a g-l-flip; 
If i = 0, J > 1: 0102 (02^-^12 = 201(02)^-112 is a g-l-flip; 

If i = 0, j = 1: 010212 is of type V; 
2: (20)+12(02)*1 2 = 1(20)*21(02)+ is a g-l-flip; 
2: 21(20)^i12 with i > 0: 

If i = 1: 212012 = Y3; 

If i > 1: 2120(20)^{i-1}12 = 021(20)^{i-1}12 is a g-l-flip. □ 
Theorem 6. There exists a polynomial time algorithm for optimally sorting ternary strings. 
Proof. Follows rather easily from Theorem O □ 

Finally, in light of the fact that the complexity of the sorting problem on quaternary (and higher) 
strings remains open, the following serves as an intermediate result: 

Theorem 7. For every fixed k there is a PTAS for sorting k-ary strings. 

Proof. The proof is very similar to the proof of Theorem 13. We assume that $k \ge 4$. Let
$N = (3k - 2)/\epsilon + k$. Let $s$, the string we wish to sort, be of length $n$. We distinguish two
cases. (In both cases it is useful to note that $d_s(s) \le 2n$ because we can always bring the greatest
symbol not yet in its final position to the front and then to its correct position.)

Case 1. If $n \ge N$, we first group the string using the "greedy" algorithm from the proof
of Lemma |21, which yields a permutation on $k$ symbols. This permutation can then be easily
sorted with at most $2k$ flips. Thus the total number of flips, denoted by $d_f(s)$, is at most
$(n - 2) + 2k$. This together with the grouping lower bound of Lemma ^ of $n - k$ on $d_s(s)$ yields
$d_f(s) \le d_s(s) + (3k - 2) \le (1 + \epsilon)\, d_s(s)$.

Case 2. If $n < N$ we apply brute force by selecting the shortest sorting sequence from among
all length-$2n$ sequences of flips; there are at most $n^{2n}$ such sequences. Given that $d_s(s) \le 2n$
this is guaranteed to give an optimal solution. The running time in this case is bounded by a constant.

□ 
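
The two ingredients shared by both cases, the bound $d_s(s) \le 2n$ and an exhaustive search for short strings, can be illustrated with a minimal Python sketch. The function names are hypothetical, the exhaustive search is written as a breadth-first search rather than an explicit enumeration of flip sequences, and the greedy grouping step of Case 1 is not reproduced here.

```python
from collections import deque

def flip(s, i):
    """Prefix reversal: reverse the first i symbols of the string s."""
    return s[:i][::-1] + s[i:]

def simple_sort_flips(s):
    """Sort s using at most 2 flips per position (fewer than 2n in total).

    Working from the right, a copy of the symbol that belongs at position m
    is brought to the front and then flipped into place.
    """
    target = "".join(sorted(s))
    flips = []
    for m in range(len(s), 1, -1):
        if s[m - 1] == target[m - 1]:
            continue  # position m already holds its final symbol
        i = s.rindex(target[m - 1], 0, m - 1) + 1  # 1-based position of a needed symbol
        for length in (i, m):
            if length > 1:  # a flip of length 1 changes nothing
                s = flip(s, length)
                flips.append(length)
    assert s == target
    return flips

def brute_force_sort_distance(s):
    """Exact sorting distance by breadth-first search (only sensible for short s)."""
    target = "".join(sorted(s))
    dist = {s: 0}
    queue = deque([s])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for i in range(2, len(cur) + 1):
            nxt = flip(cur, i)
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
```

For instance, `simple_sort_flips("201")` returns `[3, 2]`, which happens to match the exact distance `brute_force_sort_distance("201") == 2`; in general the simple procedure only gives an upper bound.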

5 Prefix reversal diameter 

Let $S(n, k)$ be the set of fully $k$-ary strings of length $n$. We define $\delta(n, k)$ as the largest
value of $d(s, t)$ ranging over all compatible $s, t \in S(n, k)$.

Theorem 8. For all $n \ge 2$, $\delta(n, 2) = n - 1$.

Proof. To prove $\delta(n, 2) \ge n - 1$, consider compatible $s, t \in S(n, 2)$ with $s = (10)^{n/2}$ in case
$n$ is even and $s = 0(10)^{(n-1)/2}$ in case $n$ is odd, and in both cases $t = I(s)$, i.e. $t$ is the sorted
version of $s$. By Theoremgl, $d(s, t) \ge n - 1$.

The proof that $\delta(n, 2) \le n - 1$ for all $n \ge 2$ is by induction on $n$. The claim is trivially true
for $n = 2$. Consider two compatible binary strings of length $n$: $s = s_1 s_2 \ldots s_n$ and
$t = t_1 t_2 \ldots t_n$. If $s_n = t_n$ then by induction $d(s, t) \le n - 2$. Thus, suppose (wlog) $s_n = 0$ and
$t_n = 1$. If $t_1 = 0$ then $f^{(n)}(t)$ and $s$ both end with a 0, and using induction and symmetry
$d(s, t) \le 1 + d(f^{(n)}(t), s) \le n - 2 + 1 = n - 1$. An analogous argument holds if $s_1 = 1$.

There remains the case $s_1 = s_n = 0$ and $t_1 = t_n = 1$. First, suppose $t_{n-1} = 0$. Since $s$ and $t$
are compatible, there must exist an index $i$ such that $s_i = 0$ and $s_{i+1} = 1$. Hence,
$f^{(n)}(f^{(i+1)}(s))$ ends with 01 like $t$, and by induction
$d(s, t) \le 2 + d(f^{(n)}(f^{(i+1)}(s)), t) \le 2 + (n - 3) = n - 1$. Analogously, we resolve the case
$s_{n-1} = 1$.

Finally, suppose $s = 0 \ldots 00$ and $t = 1 \ldots 11$. If $s$ contains 11 as a substring, then flipping that
11 (in the same manner as above) to the back of $s$ using 2 flips gives two strings that both end
in 11. Alternatively, if $s$ does not contain 11 as a substring then $s$ has at least two more 0's than
1's, which implies that $t$ must contain 00 as a substring. In that case two prefix reversals on $t$
suffice to create two strings that both end with 00. In both cases, the induction hypothesis gives
the required bound. □
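
The lower-bound witnesses used in the first half of the proof can be checked for small $n$ by exhaustive search. The following is a minimal sketch under the assumption that $I(s)$ is the ascending sort of $s$; `flip_distance` and `witness` are hypothetical helper names, and the search is only practical for short strings.

```python
from collections import deque

def flip(s, i):
    """Prefix reversal: reverse the first i symbols of s."""
    return s[:i][::-1] + s[i:]

def flip_distance(s, t):
    """Minimum number of prefix reversals taking s to t (breadth-first search)."""
    if sorted(s) != sorted(t):
        raise ValueError("s and t are not compatible")
    dist = {s: 0}
    queue = deque([s])
    while queue:
        cur = queue.popleft()
        if cur == t:
            return dist[cur]
        for i in range(2, len(cur) + 1):
            nxt = flip(cur, i)
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)

def witness(n):
    """(10)^(n/2) for even n and 0(10)^((n-1)/2) for odd n."""
    return "10" * (n // 2) if n % 2 == 0 else "0" + "10" * ((n - 1) // 2)

for n in range(2, 9):
    s = witness(n)
    # each line shows n, the witness, and its distance to the sorted string
    print(n, s, flip_distance(s, "".join(sorted(s))))
```

Each witness has $n$ alternating blocks and ends with a 0, so the printed distance should be $n - 1$ in every row, in line with the theorem.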
Note that, trivially, $d(s, t) \le 2n$ for all compatible $s, t \in S(n, k)$, for all $k$, because two prefix
reversals always suffice to increase the maximal common suffix between $s$ and $t$ by at least 1. The
following tighter bound gives the best bound known on the diameter of ternary strings.

Lemma 8. For any two compatible $s, t \in S(n, k)$, for any $k$, let $a$ be the most frequent symbol in
$s$ and $\alpha$ its multiplicity. Then $d(s, t) \le 2(n - \alpha)$.

Proof. We prove the lemma by induction on $n$. The lemma is trivially true for $n = 2$. Consider
$s, t \in S(n, k)$. If $s_n = t_n = a$ then $s_1 s_2 \ldots s_{n-1}$ and $t_1 t_2 \ldots t_{n-1}$ are compatible length-$(n - 1)$
strings where the most frequent symbol occurs at least $\alpha - 1$ times. Thus, by induction
$d(s, t) \le 2((n - 1) - (\alpha - 1)) = 2(n - \alpha)$. In case $s_n = t_n \ne a$ induction even gives
$d(s, t) \le 2((n - 1) - \alpha) = 2(n - \alpha) - 2$. Thus, suppose $s_n \ne t_n$, implying wlog that $t_n = b \ne a$.
Suppose $s_i = b$; after two flips $s' = f^{(n)}(f^{(i)}(s))$ has $b$ at the end: $s'_n = t_n$. Moreover the
length-$(n - 1)$ prefixes of $s'$ and $t$ still contain $\alpha$ $a$'s. Hence by induction
$d(s, t) \le 2 + d(s', t) \le 2 + 2((n - 1) - \alpha) = 2(n - \alpha)$. □
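
The induction above is constructive, and unrolling it gives a simple procedure. The sketch below is one possible reading of the proof, not an implementation taken from the paper: positions are fixed from right to left, the "wlog" step is handled by flipping whichever of the two strings does not end in the most frequent symbol, and flips made on $t$ are replayed backwards at the end. All function names are hypothetical, and the input strings are assumed to be non-empty.

```python
from collections import Counter

def flip(s, i):
    """Prefix reversal: reverse the first i symbols of s."""
    return s[:i][::-1] + s[i:]

def apply_flips(s, flips):
    """Apply a list of flip lengths to s, left to right."""
    for i in flips:
        s = flip(s, i)
    return s

def lemma8_flips(s, t):
    """Flip lengths taking s to t, using at most 2(n - alpha) flips."""
    if Counter(s) != Counter(t):
        raise ValueError("s and t are not compatible")
    a = Counter(s).most_common(1)[0][0]  # a most frequent symbol
    on_s, on_t = [], []                  # flips applied to s, respectively to t
    for m in range(len(s), 1, -1):       # fix position m (1-based), right to left
        if s[m - 1] == t[m - 1]:
            continue                     # positions already agree: no flips needed
        if t[m - 1] != a:
            # bring a copy of b = t_m from inside s to position m of s
            i = s.index(t[m - 1], 0, m - 1) + 1
            for length in (i, m):
                if length > 1:
                    s = flip(s, length)
                    on_s.append(length)
        else:
            # symmetric case: s_m is not a, so modify t instead
            i = t.index(s[m - 1], 0, m - 1) + 1
            for length in (i, m):
                if length > 1:
                    t = flip(t, length)
                    on_t.append(length)
    assert s == t
    # every flip is its own inverse, so the flips made on t are undone in reverse order
    return on_s + on_t[::-1]
```

Every pair of flips places a symbol other than $a$ at its final position, which is why the length of the returned list never exceeds $2(n - \alpha)$.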

Lemma 9. For all $n > 3$, $n - 1 \le \delta(n, 3) \le (4/3)n$.

Proof. Since in any ternary case $\alpha \ge \lceil n/3 \rceil$, Lemma 8 implies $\delta(n, 3) \le (4/3)n$. To prove
$\delta(n, 3) \ge n - 1$ we distinguish between odd and even $n$. For odd $n = 2h + 1$, let $s$ be $2(01)^h$, and
for even $n = 2h$ let $s = 01(21)^{h-1}$. In both cases we let $t = I(s)$. We observe that, in the even and
in the odd case, $s2$ is a bad I-string and a bad IV-string, respectively, in the sense of Definition|21.
Thus, by Theorem|Sl we have that $d(s, t) = d(s2, t2) = (n + 1) - 2 = n - 1$. (Here $s2$, respectively
$t2$, refers to the concatenation of $s$, respectively $t$, with an extra 2 symbol.) □

Brute force enumeration has shown that, for $4 \le n \le 13$, $\delta(n, 3) = n - 1$. (Note that $\delta(3, 3) = 3$
because $d(021, 012) = 3$.) Proving or disproving the conjecture that $\delta(n, 3) = n - 1$ for $n > 3$
remains an intriguing open problem'^. 
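
A brute force enumeration of this kind can be sketched in a few lines of Python for very small $n$. The sketch assumes that "fully $k$-ary" means that each of the $k$ symbols occurs at least once and that "compatible" means equal symbol counts; the function names are hypothetical and no attempt is made to reach the range of $n$ reported above.

```python
from collections import deque
from itertools import product

def flip(s, i):
    """Prefix reversal: reverse the first i symbols of s."""
    return s[:i][::-1] + s[i:]

def eccentricity(s):
    """Largest flip distance from s to any string compatible with s."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        cur = queue.popleft()
        for i in range(2, len(cur) + 1):
            nxt = flip(cur, i)
            if nxt not in dist:  # flips preserve symbol counts, so nxt is compatible
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return max(dist.values())

def diameter(n, k):
    """delta(n, k): maximum of d(s, t) over compatible, fully k-ary s and t."""
    alphabet = "0123456789"[:k]
    best = 0
    for symbols in product(alphabet, repeat=n):
        if len(set(symbols)) == k:  # keep only fully k-ary strings
            best = max(best, eccentricity("".join(symbols)))
    return best
```

Under these assumptions `diameter(n, 2)` reproduces Theorem 8 for small $n$, and `diameter(3, 3)` gives 3. The running time grows quickly with $n$, so larger cases would need a more careful implementation.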

6 Prefix reversal distance 

We show that computing flip distance is NP-hard on binary strings. We also point out, using a
result from [12], that computing flip distance on arbitrary strings is polynomial-time reducible (in
an approximation-preserving sense) to computing it on binary strings.

Theorem 9. The problem of computing the prefix reversal distance of binary strings is NP-hard. 

Proof. We prove NP-completeness of the corresponding decision problem: 
Name: BINARY-PD (2PD shortly) 

Input: Two compatible strings $s, t \in S(n, 2)$, and a bound $B \in \mathbb{Z}^+$. 
Question: Is $d(s, t) \le B$? 

2PD $\in$ NP, since a certificate for a positive answer consists of at most $B$ flips*. To show completeness
we use a reduction from 3-Partition [7] (cf. [2] and ^).
Name: 3-Partition (3P shortly) 

Input: A set $A = \{a_1, a_2, \ldots, a_{3k}\}$ and a number $N \in \mathbb{Z}^+$. Element $a_i$ has size
$r(a_i) \in \mathbb{Z}^+$ satisfying $N/4 < r(a_i) < N/2$, $i = 1, \ldots, 3k$, and $\sum_{i=1}^{3k} r(a_i) = kN$. 

Question: Can $A$ be partitioned into $k$ disjoint triplet sets $A_1, A_2, \ldots, A_k$ such that
$\sum_{a \in A_j} r(a) = N$, $j = 1, \ldots, k$? 

Given instance $I = (A, N, r)$ of 3P, we create an instance of 2PD by setting $B = 6k$ and building
two compatible binary strings $s$ and $t$: 

^ Interestingly, initial experiments with brute force enumeration have also shown that, for $4 \le n \le 10$,
$\delta(n, 4) = n$, and for $5 \le n \le 9$, $\delta(n, 5) = n$. 
* Recall that for all compatible strings $s, t \in S(n, 2)$, trivially $d(s, t) \le 2n$. 
This construction is clearly polynomial in a unary encoding of the 3P instance; we use the strong
NP-hardness of 3P [7]. We claim that $I = (A, N, r)$ is a positive instance of 3P $\Leftrightarrow$ $d(s, t) \le 6k$.

Let $a_{ij}$ denote the $j$th element from triple $A_i$ (in arbitrary order), $j = 1, 2, 3$, $i = 1, \ldots, k$,
and let us abuse its name also to denote the corresponding 1-block of length $r(a_{ij})$ in $s$.

That $s$ can be transformed to $t$ in $6k$ flips follows directly from the correctness of the following
claim for $h = k$. 

Claim. For $1 \le h \le k$, $s$ can be transformed into a string $\psi_h = \alpha_h \omega_h$ in $h$ phases, each
consisting of 6 flips, where $\psi_h$ has the following specific properties: 

(1) The suffix (i.e. $\omega_h$) is equal to $(01^N)^h$ and contains all $3h$ 1-blocks corresponding to the
elements in $\bigcup_{j=1}^{h} A_j$; 

(2) The prefix (i.e. $\alpha_h$) contains the remaining $3(k - h)$ 1-blocks, each of them flanked by 0-blocks
of length at least 3, except possibly a 0-block of length 2 at its right end. (Given that $\psi_h = \alpha_h \omega_h$
it follows that, in $\psi_h$, all these remaining 1-blocks are flanked by 0-blocks of length at least 3.) 

Proof. The proof is by induction. First we transform $s$ into $\psi_1$ in 6 flips: flips 1 and 2 bring
$a_{11}$ to the back, flips 3 and 4 bring $a_{12}$ to the back (just in front of $a_{11}$) and flips 5 and 6 bring
$a_{13}$ to the back (just in front of $a_{12}$). No 0-blocks are cut in this process, and only 1-blocks
$a_{11}$, $a_{12}$ and $a_{13}$ are affected (i.e. concatenated into a single length-$N$ 1-block). 

Now, suppose by induction that after $6(h - 1)$ flips we have created $\psi_{h-1}$. The next 6 flips
(which form phase $h$) work exclusively on $\alpha_{h-1}$. Flips 1 and 2 bring $a_{h1}$ to the front and then to
the back of $\alpha_{h-1}$; flips 3 and 4 bring $a_{h2}$ to the front and then to the back just in front of $a_{h1}$;
flips 5 and 6 bring $a_{h3}$ to the front and then to the back just in front of $a_{h2}$. These 6 flips (which
do not cut any 0-blocks within $\alpha_{h-1}$)^ thus transform $\alpha_{h-1}$ into a string with $01^N$ at the suffix,
which appended to $\omega_{h-1}$ gives a suffix equal to $\omega_h$. The only question is whether the resulting
overall string satisfies condition (2). The only obstacle to this is the possible length-2 0-block at
the end of $\alpha_{h-1}$. However, this block is not flipped in flip 1 of phase $h$, it is brought to the front in
flip 2, and concatenated to another 0-block in flip 3, leaving the prefix string without a length-2
0-block. This completes the proof of the claim. 

($\Leftarrow$) Suppose that $I$ is a negative instance of 3P. We show that $d(s, t) > 6k$. Notice that if $I$ is not
a positive instance then in any sequence of flips taking $s$ to $t$ some flip must split a 1-block, i.e.
be of the form ...1 1.... Below we add this to a list of tasks that any sequence of flips taking $s$ to $t$ must complete: 

(0) split at least one 1-block; 

(1) reduce the number of 1-blocks by $2k$; 

(2) bring a 1 symbol to the end of the string (because $t$ ends with a 1, but $s$ does not); 

(3) increase the number of singleton 0-blocks by $k - 1$; 

(4) reduce the number of big (i.e. of length at least 3) 0-blocks by $3k$. 

To prove that at least $6k + 1$ flips are needed to complete tasks (0)-(4), we show that flips which
make progress towards completing one of the tasks cannot effectively be used to make progress
on another task. From this it follows that at least $1 + 2k + 1 + (k - 1) + 3k = 6k + 1$ flips will be
needed.
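
The quantities tracked by tasks (1)-(4) are easy to compute for any binary string. The sketch below is only an illustration: `block_stats` and `min_flips_if_tasks_disjoint` are hypothetical names, and the example string used afterwards is not the string $s$ of the reduction.

```python
from itertools import groupby

def block_stats(s):
    """Count the block types that tasks (1)-(4) refer to in a binary string s."""
    blocks = [(symbol, len(list(run))) for symbol, run in groupby(s)]
    return {
        "one_blocks": sum(1 for symbol, _ in blocks if symbol == "1"),
        "singleton_zero_blocks": sum(1 for symbol, n in blocks if symbol == "0" and n == 1),
        "big_zero_blocks": sum(1 for symbol, n in blocks if symbol == "0" and n >= 3),
        "ends_with_one": s.endswith("1"),
    }

def min_flips_if_tasks_disjoint(k):
    """One flip per unit of progress on tasks (0)-(4): 1 + 2k + 1 + (k - 1) + 3k."""
    return 1 + 2 * k + 1 + (k - 1) + 3 * k
```

For the illustrative string `"0001100101000"` the function reports three 1-blocks, one singleton 0-block and two big 0-blocks; the 0-block of length 2 counts as neither singleton nor big.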

It is immediately clear that task (2), requiring a flip of a whole string, cannot be combined 
with any of the other tasks in one flip. Notice that any task(0)-flip (which is of the form 1...1 1... or 
of the form 0...1 1...) does not decrease the number of 1-blocks, while 0-blocks remain unaffected. 
So such flips do not contribute to tasks (1)-(4). Nor can any task(1)-flip (which is always of the 
form 1...0 1...) contribute to any of the other tasks from the list. It is also not too difficult to verify 
that it is not possible to reduce the number of big blocks by 2 or more in one flip. However, some 
types of task(3)-flip can at the same time also contribute to task (4), and some other types of 
task(3)-flip can increase the number of singleton 0-blocks by two, effectively contributing 'twice' 
to task (3). Such flips we call (34)- and (33)- flips, respectively. We will show that all (34)- and 

^ Observe that, in terms of its action on the overall string, flip 2 of phase $h$ does cut a 0-block, cutting
$\alpha_{h-1}$ from $\omega_{h-1}$, creating the singleton 0-block in between two length-$N$ 1-blocks. 
(33)-flips necessarily have to be succeeded by at least one flip that does not, in an overall sense, 
help us with the completion of the tasks. 
Any (33)-flip is of the type 

(33.1) 1...0 0... (where the Os form a complete block) 
Any (34)-flip is of the type: 

(34.1) 1...0 00... (where the Os form a complete block) 

(34.2) 1...00 0... (where the Os form a complete block) 

(34.3) 000.. .10 00... 

We emphasize here that 00 is not considered to be a big 0-block. 

After a flip of type (33.1), (34.1) or (34.3) we have a single 0 at the front. In such a situation
a task(1)- or task(2)-flip is not possible. We cannot perform a task(3)-flip because flips of the
form 01... 0... will destroy the initial singleton 0, and flips of the form 01... 1... cannot create new
singleton 0s. The only task(4)-flip possible is 01...00 0... (where the second group of 0s forms a
complete block) but this also reduces the number of singleton 0-blocks by 1, meaning that an
extra task(3)-flip would then be needed. Termination is not an option (because $t$ does not begin
with 01). A task(0)-flip of the form 01...1 1... is potentially possible but, as noted, this increases
the number of required task(1)-flips. 

After a flip of type (34.2) we are left with 001 at the front. Again, a task(1)- or task(2)-flip 
is not possible in this situation, and neither is termination. A task(3)-flip is potentially possible 
but this brings a single 0 to the front, which (by the earlier argument) cannot be followed by any 
useful flip. A task(4)-flip is not possible because, when the string begins with 001, a task(4)-flip 
must necessarily split a 00-adjacency in some big 0-block, but this simply creates a different big 
0-block. □ 

For studying problems on arbitrary strings, let $X$ and $Y$ be two compatible, length-$n$ strings, where
we assume (wlog) that each of the symbols from $X$ and $Y$ is drawn from the set $\{0, 1, \ldots, n - 1\}$.
We define $D(X, Y)$ as the smallest number of flips required to transform $X$ to $Y$. The arity of
the strings $X$ and $Y$ does not need to be fixed, and symbols may be repeated. Hence, sorting of a
permutation by flips (MIN-SBPR), and the flip distance problem over fixed arity strings, are both
special cases of computing $D$. The fact that computing $D$ is a generalisation of computing the distance $d$
of binary strings immediately implies that it is NP-hard. However, an approximation-preserving
reduction in the other direction is possible, meaning that inapproximability results for one of the
problems will be automatically inherited by the other.

Theorem 10. Given two compatible strings $X$ and $Y$ of length $n$ with each symbol from $X$ and
$Y$ drawn from $\{0, 1, \ldots, n - 1\}$, it is possible to compute in time polynomial in $n$ two binary strings
$x$ and $y$ of length polynomial in $n$ such that $D(X, Y) = d(x, y)$.

As demonstrated shortly, the above result follows directly from work by Radcliffe, Scott and 
Wilmer. A little background is necessary to understand the context. In Theorem 8 of [12] it 
is shown that sorting permutations by reversals is directly reducible to the reversal distance problem 
on binary strings. It is later argued (in Theorem 11 of [12]) that the same reduction technique 
can be used to reduce the transposition distance problem on a 4-ary alphabet to the transposition 
distance problem on a binary alphabet. The proof of Theorem 11 lacks detail but personal com- 
munication with the authors ^21 has since clarified that the result is correct. Furthermore, the 
reduction technique underpinning Theorems 8 and 11 from [12] can be directly applied to prove 
the present theorem. We show this by reproducing the reduction technique (complete with clarification) 
in the context of prefix reversals. We also use this opportunity to clarify the correctness 
of Theorem 11 from [12]. The following should thus be considered attributed to Radcliffe, Scott 
and Wilmer. 

Proof. The strings $x$ and $y$ are constructed as follows: 

$x = (1 0^{X_1 + 1} 1)^{2n+1} \ldots (1 0^{X_n + 1} 1)^{2n+1}$ 
$y = (1 0^{Y_1 + 1} 1)^{2n+1} \ldots (1 0^{Y_n + 1} 1)^{2n+1}$ 
In the above encoding, each symbol $X_i$ is thus encoded as the fragment $(1 0^{X_i + 1} 1)^{2n+1}$, each
fragment consisting of $2n + 1$ subfragments. (This also holds for each symbol in $Y$.) Note that a
fragment is reversal invariant. To see that $d(x, y) \le D(X, Y)$, observe that, by mapping to prefix
reversals that cut at the boundaries between fragments, any sequence of $m$ prefix reversals taking
$X$ to $Y$ can be trivially mapped to $m$ prefix reversals which take $x$ to $y$.

The proof that $D(X, Y) \le d(x, y)$ is more involved. Combining $d(x, y) \le D(X, Y)$ with the
trivial fact that $D(X, Y) \le 2n$ yields $d(x, y) \le 2n$. Now, consider any shortest sequence of prefix
reversals taking $x$ to $y$. This sequence of prefix reversals will cut the string $x$ in at most $2n$
places. A subfragment within $x$ is said to survive iff it is not cut by any of these prefix reversals.
Now, construct a bipartite graph with vertex set $\{e_1, e_2, \ldots, e_n\} \cup \{f_1, f_2, \ldots, f_n\}$ and add an edge
$(e_i, f_j)$ iff some subfragment of the fragment corresponding to $X_i$ survives and ends up in the
fragment corresponding to $Y_j$. Observe that within any set of $m$ fragments from $x$, strictly more
than $(m - 1)(2n + 1)$ subfragments will survive (at most $2n$ of the $m(2n + 1)$ subfragments are cut),
and hence at least $m$ fragments from $y$ will be required to absorb these surviving subfragments.
Thus, by Hall's Theorem, the graph has a perfect matching. For each edge $(e_i, f_j)$ of the perfect
matching, pick a subfragment from the fragment corresponding to $X_i$ that survives and ends up in
the fragment corresponding to $Y_j$. Considering the action of the flips only on these $n$ subfragments,
we see that there exists a sequence of $d(x, y)$ prefix reversals transforming the sequence of symbols
in $X$ into the sequence of symbols in $Y$, and thus $D(X, Y) \le d(x, y)$. □
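
The encoding and the easy direction $d(x, y) \le D(X, Y)$ of the proof can be sketched as follows, assuming that the symbol $X_i$ is encoded as the fragment $(1 0^{X_i + 1} 1)^{2n+1}$. The helper names `fragment`, `encode` and `boundary` are hypothetical. The sketch only illustrates that a flip on $X$ corresponds to a flip on $x$ that cuts at a fragment boundary; it does not attempt the harder direction.

```python
def flip(seq, i):
    """Prefix reversal of the first i items (works for lists and strings)."""
    return seq[:i][::-1] + seq[i:]

def fragment(symbol, n):
    """Encoding of one symbol: the subfragment 1 0^(symbol+1) 1 repeated 2n+1 times."""
    return ("1" + "0" * (symbol + 1) + "1") * (2 * n + 1)

def encode(X):
    """Binary encoding of a string X over {0, ..., n-1}, where n = len(X)."""
    n = len(X)
    return "".join(fragment(v, n) for v in X)

def boundary(X, i):
    """Length of the prefix of encode(X) that encodes the first i symbols of X."""
    n = len(X)
    return sum(len(fragment(v, n)) for v in X[:i])

X = [2, 0, 1, 0]
for i in range(1, len(X) + 1):
    # flipping i symbols of X matches flipping the first i fragments of x,
    # because every fragment is a palindrome
    assert encode(flip(X, i)) == flip(encode(X), boundary(X, i))
```

Because each fragment reads the same in both directions, reversing a block of whole fragments only reverses their order, which is exactly what the assertion in the loop checks.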

The correctness of Theorem 11 from [12] follows by using the same reduction but encoding each 
fragment as $3n$ subfragments rather than $2n + 1$ subfragments. (The transposition distance between 
two compatible length-$n$ strings is strictly less than $n$, and a transposition cuts a string in 
at most 3 places.) Indeed, it is easy to see that the reduction works for a whole family of string 
rearrangement operators, by ensuring that the number of subfragments per fragment is sufficiently 
large. For example, consider a rearrangement operator op, and let u be some upper-bound on the 
number of places an op-operation can cut a string. Let v be any upper bound on the maximum 
value of dop{X, Y) ranging over all compatible length-n strings X, Y. Encoding each fragment with 
uv + 1 subfragments is sufficient to generalise the above reduction. 

7 Open problems 

In this study we have unearthed many rich (and surprisingly difficult) combinatorial questions 
which deserve further analysis. We discuss some of them here. The main unifying, "umbrella" 
suggestion is that, to go beyond ad-hoc (and case-based) proof techniques, it will be necessary to 
develop deeper, more structural insights into the action of flips on strings over fixed size alphabets. 

Grouping and sorting on higher arity alphabets. We have shown how to group and sort 
optimally binary and ternary strings, but characterisations and algorithms for quaternary (and 
higher) alphabets have so far evaded us. As observed in Section it seems that for $k = 4, 5$ and 
for sufficiently long strings, the strings with grouping distance $n - 2$ settle into some kind of pattern, 
but this has not yet offered enough insights to allow either the development of a characterisation 
or of an algorithm. Related problems include: for all fixed $k$, are there polynomial algorithms to 
optimally sort (optimally group) $k$-ary strings? Is grouping strictly easier than sorting, in a complexity 
sense? How does grouping function under other operators e.g. reversals, transpositions? 
An upper bound on the grouping transposition distance has been presented in 

Diameter questions. Proving or disproving that $\delta(n, 3) = n - 1$ for $n > 3$ remains the obvious
open diameter question. Beyond that, diameter results for quaternary and higher arity alphabets
are needed. How does the diameter $\delta(n, k)$ grow for increasing $k$? (At this point we conjecture
that, for sufficiently long strings, the diameter of 3-ary, 4-ary and 5-ary strings is $n - 1$, $n$, and $n$
respectively.)

The suspicion also exists that, for all $k$ and for all sufficiently large $n$, there exists a length-$n$
fully $k$-ary string $s$ such that $d(s, I(s)) = \delta(n, k)$. In other words, the set of all pairs of strings that
are $\delta(n, k)$ flips apart includes some instances of the sorting problem. It should be noted however
that, following empirical testing, it is apparent that there are also very many pairs of strings $s, t$
with $s \ne I(t)$ and $t \ne I(s)$ that are $\delta(n, k)$ flips apart.

It also seems important to develop diameter results for subclasses of strings, perhaps (as in
Lemma 8) characterised by the frequency of their most frequent symbol. It may be that such refined
diameter results for $k$-ary alphabets provide information that is important in determining $\delta(n, k + 1)$.

Note finally that the diameter of strings over fixed size alphabets, i.e. $\delta(n, k)$, is always bounded
from above by the diameter of permutations, wc(n). This is because the distance problem on two
length-$n$, fixed size alphabet strings $s, t$ can easily be rewritten as a sorting problem on a length-$n$
permutation $\pi$, such that a sequence of prefix reversals sorting the permutation also suffices to
transform $s$ into $t$. Indeed, because of this relabelling property, the flip distance between two fixed
size alphabet strings can be viewed as being equal to the minimum permutation sorting distance,
ranging over all such relabellings into a permutation $\pi$. Can this relationship between the fixed
size alphabet and permutation world be further specified and exploited?
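
The relabelling property can be checked directly on short strings. In the sketch below, which uses hypothetical function names, position $j$ of $t$ receives the label $j$, every position of $s$ receives a distinct label of a position of $t$ carrying the same symbol, and the flip distance is recovered as the minimum sorting distance over all such labellings.

```python
from collections import deque
from itertools import permutations, product

def flip(seq, i):
    """Prefix reversal of the first i items (works for strings and tuples)."""
    return seq[:i][::-1] + seq[i:]

def bfs_distance(start, goal):
    """Minimum number of prefix reversals taking start to goal."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return dist[cur]
        for i in range(2, len(cur) + 1):
            nxt = flip(cur, i)
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)

def relabellings(s, t):
    """All permutations pi with t[pi[i]] == s[i] for every position i."""
    symbols = sorted(set(s))
    src = {c: [i for i, x in enumerate(s) if x == c] for c in symbols}
    dst = {c: [j for j, x in enumerate(t) if x == c] for c in symbols}
    for choice in product(*(permutations(dst[c]) for c in symbols)):
        pi = [None] * len(s)
        for c, targets in zip(symbols, choice):
            for i, j in zip(src[c], targets):
                pi[i] = j
        yield tuple(pi)

def distance_via_relabelling(s, t):
    """Minimum permutation sorting distance over all relabellings of (s, t)."""
    identity = tuple(range(len(s)))
    return min(bfs_distance(pi, identity) for pi in relabellings(s, t))
```

On small inputs `distance_via_relabelling(s, t)` agrees with `bfs_distance(s, t)`; the number of relabellings is the product of the factorials of the symbol multiplicities, so this is an illustration rather than a practical algorithm.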

Signed strings. The problem of sorting signed permutations by flips (the burnt pancake flip- 
ping problem) is well known [H| 501) but in this paper we have not yet attempted to analyse 
the action of flips on signed, fixed size alphabet strings. Obviously, analogues of all the problems 
described in this paper exist for signed strings. 

Complexity/approximation. In the presence of hardness results (e.g. Theorem 9) it is interesting 
to explore the complexity of restricted instances, and to develop algorithms with guaranteed 
approximation bounds. For example, gives a PTAS for dense instances. The development of 
approximation algorithms is also a useful intermediate strategy where the complexity of a problem 
remains elusive. In particular, this requires the development of improved lower bounds. 
C3 = Ysi;  P4 : 20202010212;  C5 = Ysi;  P6 = Y73;
P8 : 20102020212;  P9 : 12010202012;  P10 : 21201020202

Y67 = 2120202012
P2 = Y77;  C3 is of type VI;  P4 : 02120202012;  C5 = Y41;  P6 : 02021202012;
C7 = Y4,;  P8 : 02020212012;  P9 : 10202021212;  C10 is of type VIII

Y68 = 2120102012
P2 : 12120102012;  C3 = YsT;  P4 = Y74;  P5 : 10212102012;  P6 : 01021202012;
C7 = Yso;  P8 : 02010212012;  P9 : 10201021212;  C10 = Y40

Y69 = 2021021212
P2 = Y75;  C3 is of type VI;  P4 : 12021021212;  P5 : 01202021212;  C6 = Yse;
P7 : 12012021212;  C8 = Y59;  P9 : 12120120212;  C10 = Yss

Y70 = 2010212012
P2 : 02010212012;  P3 : 10210212012;  P4 : 01020212012;  C5 = Yso;  P6 : 12010212012;
C7 = YsT;  P8 = Y74;  P9 : 10212010212;  C10 = Y39

Y71 = 1201021202
P2 : 21201021202;  P3 : 02101021202;  C4 = Yss;  P5 : 01021021202;  P6 : 20102121202;
C7 = Ysi;  P8 : 21201021202;  P9 : 02120102102;  P10 : 20212010212

Y72 = 1201202012
P2 = Y76;  P3 : 02101202012;  C4 = Y41;  P5 : 21021202012;  P6 : 02102102012;
P7 : 20210212012;  P8 : 02021021012;  C9 = Y43;  P10 : 21020210212

Y73 = 10202010212
P2 : 010202010212;  P3 : 201202010212;  P4 : 020102010212;  P5 : 202012010212;  P6 : 020201010212;
C7 = Y66;  P8 : 010202010212;  P9 : 201020201212;  C10 is of type IV;  P11 : 212010202012

Y74 = 02120102012
P2 : 202120102012;  P3 : 120120102012;  P4 : 212020102012;  C5 = Y68;  P6 : 102120102012;
C7 = Ye2;  P8 : 201021202012;  C9 = Y70;  P10 : 102010212012;  P11 : 210201021202

Y75 = 02021021212
P2 : 202021021212;  C3 = Y69;  P4 : 202021021212;  P5 : 120201021212;  C6 is of type II;
P7 : 201202021212;  P8 : 120120201212;  P9 : 212012020212;  P10 : 121201202012;  P11 : 212120120202

Y76 = 21201202012
P2 : 121201202012;  C3 = Y72;  P4 : 021201202012;  P5 : 102121202012;  C6 = Ye2;
P7 : 021021202012;  C8 = Yei;  P9 : 020210212012;  P10 : 102021021212;  C11 = Ym

Y77 = 12120202012
P2 : 212120202012;  C3 = Y67;  P4 : 212120202012;  P5 : 021210202012;  P6 : 202121202012;
P7 : 020212102012;  P8 : 202021212012;  P9 : 020202121012;  C10 is of type II;  P11 : 210202021212



Table 4. All strings of type IX (first column). For each string all parents and all 1-flips are listed.

For each parent, either it is bad or a 1-flip to a good string is given. For each string of type IX it is
also shown that each 1-flip leads to a bad string. Here Pi denotes the parent you get by doubling the
i-th symbol and applying the flip of the first i symbols, and Ci denotes the string you get by applying
the 1-flip p(i - 1). Note that if the i-th symbol is not equal to the first symbol there is a parent Pi and
if the i-th symbol is equal to the first symbol there is a 1-flip possible, leading to Ci.